Parallel VA-File
Abstract
Similarity search is one of the typical query types for multimedia retrieval, data mining, and decision support systems. Many similarity measures transform objects into points in a high-dimensional vector space and define the similarity of two objects with respect to their distance in that space. Data-partitioning index methods for such spaces, such as the R-tree or the X-tree, are known to deteriorate as the number of dimensions increases. A newer approach, the so-called VA-File, overcomes the difficulties of high dimensionality. However, the VA-File search method is linear in the number of objects. In this paper, we investigate how to parallelize this linear process, either by assigning multiple disks to a single processor or by using a cluster of workstations. Performance measurements of the parallel VA-File in a multiple-disk environment are very promising and show almost linear speed-up as the number of disks increases. Distributed VA-File search in a cluster of workstations aims not only at reducing I/O and CPU cost, but also at exploiting unused capacity in the cluster.
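The abstract describes VA-File search as a linear scan over compact vector approximations, parallelized by spreading the data over several disks or workstations. The Python sketch below illustrates that idea under assumptions that are not taken from the paper: uniform per-dimension cell boundaries with `BITS` bits per dimension, round-robin declustering of the rows into partitions, a two-phase filter-and-refine k-NN search inside each partition, and a final merge of the per-partition results. Threads only stand in for independent disks or cluster nodes, and all names (`build`, `local_knn`, `parallel_knn`, `BITS`) are hypothetical.

```python
import bisect
import heapq
import random
from concurrent.futures import ThreadPoolExecutor

BITS = 4            # hypothetical: bits per dimension in each approximation
CELLS = 1 << BITS   # number of cells per dimension


def make_marks(data, dim):
    """Uniform cell boundaries per dimension (an assumption, not the paper's choice)."""
    marks = []
    for d in range(dim):
        lo = min(v[d] for v in data)
        hi = max(v[d] for v in data)
        width = (hi - lo) / CELLS or 1.0
        # The last mark is at least hi, so every value lies inside its cell.
        marks.append([lo + i * width for i in range(CELLS)] + [max(hi, lo + CELLS * width)])
    return marks


def build(data, n_parts):
    """Approximate every vector and decluster rows round-robin over n_parts partitions."""
    dim = len(data[0])
    marks = make_marks(data, dim)
    rows = []
    for oid, vec in enumerate(data):
        # Cell index = last mark <= coordinate, clamped to the top cell.
        approx = tuple(min(bisect.bisect_right(marks[d], vec[d]) - 1, CELLS - 1)
                       for d in range(dim))
        rows.append((oid, vec, approx))
    return [rows[i::n_parts] for i in range(n_parts)], marks


def local_knn(q, part, marks, k):
    """Two-phase filter-and-refine k-NN over one partition (one disk or node)."""
    # Phase 1: scan approximations and keep rows whose lower bound can still qualify.
    ub_heap, cands = [], []   # ub_heap: negated k smallest upper bounds (a max-heap)
    for oid, vec, approx in part:
        lb2 = ub2 = 0.0
        for d, c in enumerate(approx):
            lo, hi, x = marks[d][c], marks[d][c + 1], q[d]
            gap = lo - x if x < lo else (x - hi if x > hi else 0.0)
            far = max(abs(x - lo), abs(x - hi))
            lb2 += gap * gap
            ub2 += far * far
        if len(ub_heap) < k or lb2 <= -ub_heap[0]:
            cands.append((lb2, oid, vec))
            heapq.heappush(ub_heap, -ub2)
            if len(ub_heap) > k:
                heapq.heappop(ub_heap)
    # Phase 2: visit candidates by increasing lower bound, computing exact distances.
    cands.sort(key=lambda t: t[0])
    best = []   # max-heap of (negated squared distance, id) for the k best so far
    for lb2, oid, vec in cands:
        if len(best) == k and lb2 > -best[0][0]:
            break   # every remaining lower bound exceeds the current k-th distance
        d2 = sum((a - b) ** 2 for a, b in zip(q, vec))
        heapq.heappush(best, (-d2, oid))
        if len(best) > k:
            heapq.heappop(best)
    return [(-nd, oid) for nd, oid in best]


def parallel_knn(q, partitions, marks, k):
    """Search all partitions concurrently and merge the local k-NN lists."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        hits = [h for local in pool.map(lambda p: local_knn(q, p, marks, k), partitions)
                for h in local]
    return sorted(hits)[:k]   # (squared distance, id) pairs, closest first


if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(8)] for _ in range(2000)]
    parts, marks = build(data, n_parts=4)
    q = [random.random() for _ in range(8)]
    fast = parallel_knn(q, parts, marks, k=5)
    exact = sorted((sum((a - b) ** 2 for a, b in zip(q, v)), i)
                   for i, v in enumerate(data))[:5]
    print(fast == exact)
```

Merging is exact because the global k nearest neighbours must appear among the k nearest neighbours of the partitions that hold them, and the per-partition scans share no state, so they can run independently on separate disks or nodes. In CPython, threads will not speed up this CPU-bound loop; they only illustrate the partitioning.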
Similar resources
Interactive-Time Similarity Search for Large Image Collections Using Parallel VA-Files
Nearest-neighbor search (NN-search) plays a key role for content-based retrieval. As a first contribution, this article shows that NN-search is a meaningful implementation of similarity search, even if features are high-dimensional. But NN-search over high-dimensional features is of linear complexity and query response times have not been satisfactory for large collections of multimedia objects...
A Simple Vector-Approximation File for Similarity Search in High-Dimensional Vector Spaces
Many similarity measures in multimedia databases and decision-support systems are based on underlying vector spaces of high dimensionality. Data-partitioning index methods for such spaces (for example, grid files, R-trees, and their variants) generally work well for low-dimensional spaces, but perform poorly as dimensionality increases. This problem has become known as the 'dimensional curse'. This...
Fast Evaluation Techniques for Complex Similarity Queries
Complex similarity queries, i.e., multi-feature multi-object queries, are needed to express the information need of a user against a large multimedia repository. Even if a user initially issues a single-object query over one feature, a system with relevance feedback will automatically generate a complex similarity query. Relevance feedback is only useful if response times are interactive. There...
Efficient k-NN Search on Streaming Data Series
Data streams are common in many recent applications, e.g. stock quotes, e-commerce data, system logs, network traffic management, etc. Compared with traditional databases, streaming databases pose new challenges for query processing due to the streaming nature of data which constantly changes over time. Index structures have been effectively employed in traditional databases to improve the quer...
Accuracy and completeness of mortality data in the Department of Veterans Affairs
BACKGROUND One of the national mortality databases in the U.S. is the Beneficiary Identification and Record Locator Subsystem (BIRLS) Death File that contains death dates of those who have received any benefits from the Department of Veterans Affairs (VA). The completeness of this database was shown to vary widely from cohort to cohort in previous studies. Three other sources of death dates are...